Submitted to ApJ, October 11, 2012 

Preprint typeset using LaTeX style emulateapj v. 5/2/11



OPTIMAL CORRELATION ESTIMATORS FOR QUANTIZED SIGNALS 

ABSTRACT

Using a maximum-likelihood criterion, we derive optimal correlation strategies for signals with and 
without digitization. We assume that the signals are drawn from zero-mean Gaussian distributions, 
as is expected in radio-astronomical applications, and we present correlation estimators both with and 
without a priori knowledge of the signal variances. We demonstrate that traditional estimators of 
correlation, which rely on averaging products, exhibit large and paradoxical noise when the correlation 
is strong. However, we also show that these estimators are fully optimal in the limit of vanishing 
correlation. We calculate the bias and noise in each of these estimators and discuss their suitability 
for implementation in modern digital correlators. 

Subject headings: methods: data analysis - methods: statistical - techniques: interferometric



1. INTRODUCTION 

Because astrophysical sources emit Gaussian noise, the 
information in astrophysical observations lies in signal 
covariances. These are familiar as power spectra and
cross spectra (see, for example, Thompson et al. 2001).
The estimation of these covariances is subject to bias 
and noise, and techniques to minimize both are therefore 
fundamental to radio astronomy. 

This estimation is complicated by the typically aggres- 
sive quantization of the received signal. Even for next- 
generation phased arrays, such as the Square-Kilometer 
Array, the cost of signal transmission necessitates low-
bit quantization (Dewdney et al. 2009). This procedure
distorts the spectrum but preserves much of the underly- 
ing statistical information. In fact, several authors have 
noted the ability of quantization to improve estimates of 
correlation $\rho \in [-1, 1]$, especially for strong correlation
$|\rho| \sim 1$ (Gwinn 2004). Indeed, Cole (1968) found that
the standard two-level correlation scheme has lower noise 
than four- and six- level schemes in this limit, and that 
all of these estimates have lower noise than the correla- 
tion estimates for unquantized signals. We demonstrate 
that this paradoxical behavior arises from two causes: 
comparisons with unquantized correlation estimates are 
incomplete, and typical quantization schemes are not op- 
timal. 

To amend these deficiencies, we present a correla- 
tion estimator for unquantized data that is appropri- 
ately suited to define a quantization efficiency, and we 
derive optimal correlation estimators for quantized sig- 
nals via a maximum-likelihood criterion. With the recent 
advent of digital correlators in radio astronomy, such as
DiFX (Deller et al. 2007), implementing these techniques
is straightforward. 

1.1. Terminology and Notation 

In spite of the many treatments of quantized corre- 
lation, no standard terminology has been adopted, so 
we first outline some basic assumptions and definitions. 
Throughout this work, we use $\{x_i, y_i\}$ to designate sets
of pairs independently drawn from a zero-mean bivariate
Gaussian distribution. For simplicity, we will assume
that the standard deviations $\sigma_x$ and $\sigma_y$ are unity.

We denote ensemble averages by unsubscripted angular
brackets $\langle \ldots \rangle$. We will make use of the correlation
$\rho \equiv \langle xy \rangle / (\sigma_x \sigma_y)$ and the covariance $\langle xy \rangle$.

We denote finite averages, over a sample of $N$ points,
by subscripted angular brackets $\langle \ldots \rangle_N$. For example,
we frequently use the sample covariance
$r_\infty \equiv \langle xy \rangle_N \equiv N^{-1} \sum_{i=1}^{N} x_i y_i$.
We also use this terminology to refer to
an average product after quantization. Because most
applications of correlation in radio astronomy involve many
samples $N$, we focus on the large-$N$ regime.

Our work focuses on estimators of $\rho$, given a set of $N$
samples $\{x_i, y_i\}$, possibly after quantization. We use the
variable $r$, with subscripted identifiers, to indicate such
estimators. Finally, we generically use $P(\ldots)$ to denote
a probability density function (PDF) with respect to the
given variables and parameters.

1.2. Relation to Previous Work 

Previous analyses of quantized correlation have as- 
sumed that the correlation should be estimated via a 
form of sample covariance for the quantized signals; they 
have then optimized the performance of the correlation 
by choosing an appropriate quantization scheme.
Furthermore, these efforts generally focus on the small
correlation regime: $|\rho| \ll 1$.

For example, Jenet & Anderson (1998) provide an
approximate prescription for correcting the bias from
quantization in sample covariance. However, this prescription
still suffers from severely sub-optimal performance, in
terms of the noise, when $\rho \neq 0$.

In contrast, we provide a new mechanism for estimating
correlation and demonstrate that it provides the lowest
RMS error of any post-quantization correlation strategy
for a large number of samples. We also demonstrate
that this strategy is equivalent to traditional approaches
as $\rho \to 0$, and we give a rigorous justification for the
optimal weights that are typically quoted.

In §2 we briefly review the basic mathematical framework
of parameter estimation theory, and we define the
sense in which a particular strategy can be "optimal."
Then, in §3 we consider the case of unquantized signals
and present the corresponding optimal estimators
for correlation. Next, in §4 we summarize the details
of the quantization procedure, outline the traditional
correlation estimators via sample covariance, and derive
the maximum-likelihood estimate of correlation for
quantized signals. In §5 we give specific examples for common
quantization schemes, and compare the performance of
the maximum-likelihood estimate to that of traditional
estimates. Then, in §6 we demonstrate the critical property
that traditional correlation schemes are optimal for
small $|\rho|$. Finally, in §7 we summarize our findings and
discuss the possibilities for implementation.



2. MATHEMATICAL BACKGROUND 

We begin by reviewing some essential concepts and
terminology in parameter estimation theory. For a
comprehensive discussion of these ideas with a rigorous
description of the assumptions and constraints, see
Kendall & Stuart (1979) or Lehmann & Casella (1998).



2.1. Optimal Estimators and Maximum Likelihood 

We first ascribe a precise meaning to the term "opti- 
mal" estimator. For this purpose, we must consider both 
the bias and noise in an estimator. We seek estimates 
of correlation that converge to the exact correlation as
$N \to \infty$; such estimates are said to be consistent. We
refer to a consistent estimator with the minimum noise 
(i.e. the minimum mean squared error) as the optimal 
estimator. 

If the parameters to be estimated correspond to a
known class of distributions, then a particularly simple
estimator can be defined. Namely, consider a set of
observations $x$ drawn from a distribution that is specified
by a set of parameters $\theta_0$. One parameter estimation
strategy determines the parameters which maximize the
likelihood function $\mathcal{L}(\theta | x)$, defined as the probability of
sampling $x$ given the distribution specified by $\theta$. If $\mathcal{L}$
has a unique maximum at some $\theta_{\rm ML}$, then this point is
defined to be the maximum-likelihood estimator (MLE)
of $\theta_0$ for the sampled points $x$.

Often, the sample data $x$ can be greatly reduced to
some simplified statistic $T(x)$, which carries all the
information related to the parameters $\theta_0$. In this case,
$T(x)$ is said to be a sufficient statistic for $\theta_0$. For
example, if samples are drawn from a normal distribution
with known variance but unknown mean, then the sample
mean is a sufficient statistic for the mean. The
factorization criterion states that a necessary and sufficient
condition for $T(x)$ to be sufficient for a family of distributions
parametrized by $\theta_0$ is that there exist non-negative
functions $g$ and $h$ such that $P(x; \theta_0) = g[T(x); \theta_0]\, h(x)$.

Under weak regularity conditions, the likelihood function
also determines the minimum noise that any unbiased
estimator can achieve. This minimum, the Cramer-Rao
bound (CRB), can be expressed in terms of derivatives
of $\mathcal{L}$. For example, the minimum variance of any
unbiased estimator of a single parameter $\theta_0$ is the
inverse of the Fisher information, and can be written

$$\langle \delta\theta_0^2 \rangle \geq \left\langle \left( \left. \frac{\partial \ln \mathcal{L}(x; \theta)}{\partial \theta} \right|_{\theta = \theta_0} \right)^{2} \right\rangle^{-1}. \qquad (1)$$

Here, $\langle \ldots \rangle$ denotes an ensemble average over sets of
measurements $x$. An unbiased estimator with noise that
matches the CRB is said to be efficient.
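As a worked illustration of Eq. 1, consider the textbook case mentioned above: $N$ independent samples $x_i$ from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. This is a standard result rather than part of the derivations in this paper, and the symbol $\mu$ is introduced here only for the example:

```latex
\ln \mathcal{L}(x; \mu) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2 + \mathrm{const}
\quad\Rightarrow\quad
\frac{\partial \ln \mathcal{L}}{\partial \mu} = \frac{N}{\sigma^2} \left( \langle x \rangle_N - \mu \right),
\\
\left\langle \left( \frac{\partial \ln \mathcal{L}}{\partial \mu} \right)^2 \right\rangle
= \frac{N^2}{\sigma^4} \cdot \frac{\sigma^2}{N} = \frac{N}{\sigma^2}
\quad\Rightarrow\quad
\langle \delta\mu^2 \rangle \geq \frac{\sigma^2}{N} .
```

The sample mean $\langle x \rangle_N$ maximizes this likelihood and has variance exactly $\sigma^2/N$, so in this simple case the MLE is efficient for every $N$.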

Under general conditions, the MLE is both consistent
and asymptotically (as $N \to \infty$) efficient. In the present
work, we present the MLE of correlation for both un- 
quantized and quantized signals, and we compare these 
correlation strategies with traditional schemes. 

2.2. Distribution of Correlated Gaussian Variables 

Astrophysical observations measure zero-mean, Gaussian
noise. Under rather broad assumptions, pairs of
such samples $\{x, y\}$ are drawn from a bivariate Gaussian
distribution. In addition to the respective standard
deviations, $\sigma_x \equiv \sqrt{\langle x^2 \rangle}$ and $\sigma_y \equiv \sqrt{\langle y^2 \rangle}$, this distribution
depends on the correlation $\rho \equiv \langle xy \rangle / (\sigma_x \sigma_y) \in [-1, 1]$.
Because our present emphasis is correlation, we assume
that $\sigma_x = \sigma_y = 1$, in which case the PDF is given by

$$P(x, y; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left[ -\frac{x^2 + y^2 - 2\rho x y}{2 (1 - \rho^2)} \right]. \qquad (2)$$

For small $|\rho|$, this distribution takes the following
approximate form:

$$P(x, y; \rho) \approx \frac{1}{2\pi} (1 + \rho x y)\, e^{-\frac{1}{2}(x^2 + y^2)}. \qquad (3)$$

See Chapter 8 of Thompson et al. (2001) (hereafter
TMS) for some additional representations and discussion.
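To get a feel for how quickly the small-correlation form of Eq. 3 departs from the exact PDF of Eq. 2, the two can be compared numerically. The following is a minimal sketch; the function names are illustrative and not from the paper.

```python
import math

def pdf_exact(x, y, rho):
    """Bivariate Gaussian PDF of Eq. 2 (zero mean, unit variances)."""
    s2 = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * math.sqrt(s2))
    return norm * math.exp(-(x * x + y * y - 2.0 * rho * x * y) / (2.0 * s2))

def pdf_small_rho(x, y, rho):
    """First-order expansion in rho, Eq. 3."""
    return (1.0 + rho * x * y) * math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

def relative_error(x, y, rho):
    """Fractional difference between the expansion and the exact PDF."""
    exact = pdf_exact(x, y, rho)
    return abs(pdf_small_rho(x, y, rho) - exact) / exact
```

At $x = y = 1$ the relative error is of order $\rho^2$ for small $\rho$, and grows to several percent by $\rho = 0.5$.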

3. CORRELATION ESTIMATORS FOR 
UNQUANTIZED SIGNALS 

We now analyze several correlation estimators for un- 
quantized signals. These estimators serve two relevant 
purposes: they provide a point of reference to ascribe 
an efficiency to a quantization scheme, and they suggest 
closed-form strategies for correlation estimates of quan- 
tized signals that have a large number of bits. 

First, in §3.1 we consider the estimate of correlation
via sample covariance, denoted $r_\infty$. Next, in §3.2
we present Pearson's estimate of correlation, $r_p$, which
serves as the optimal estimator when no information
about the signal is known. Last, in §3.3 we give details
of the MLE of correlation when the signal variances
are known, which we denote $r_q$. Figure 1 compares the
asymptotic noise in these three estimates, as given in the
following sections.

3.1. Correlation via Sample Covariance: $r_\infty$

The simplest estimate of correlation follows from
the relationship between correlation and covariance.
Namely, suppose that the means $\{\mu_x, \mu_y\}$ and standard
deviations $\{\sigma_x, \sigma_y\}$ of the signals $x_i$ and $y_i$ are known. In
this case, the signals may be standardized to have zero
mean and unit variance. Their covariance is then equal
to their correlation: $\langle xy \rangle = \rho$. This correspondence
immediately suggests a simple estimator for the correlation:

$$r_\infty \equiv \langle xy \rangle_N = N^{-1} \sum_{i=1}^{N} x_i y_i.$$

Figure 1. Asymptotic noise in estimators $r_\infty$, $r_p$, and $r_q$ of $\rho$
for unquantized signals. Because the noise is a symmetric function
of $\rho$, only the positive values are shown.

This estimator is unbiased and consistent but has large
variance: $N \langle \delta r_\infty^2 \rangle \equiv N \langle (r_\infty - \rho)^2 \rangle = 1 + \rho^2$ (TMS, Eq.
8.13). This equation highlights a peculiar feature: $r_\infty$ is
the noisiest when the correlation is strongest.
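The quoted variance can be checked directly from the fourth moment of zero-mean Gaussian variables (Isserlis' theorem). A brief sketch, assuming independent pairs with unit variances:

```latex
N \langle \delta r_\infty^2 \rangle
= \langle x^2 y^2 \rangle - \langle xy \rangle^2
= \left( \langle x^2 \rangle \langle y^2 \rangle + 2 \langle xy \rangle^2 \right) - \rho^2
= (1 + 2\rho^2) - \rho^2
= 1 + \rho^2 .
```

The extra $\rho^2$ comes from the fluctuating amplitudes $x^2$ and $y^2$, which is why the noise grows with the correlation.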

3.2. Optimal Correlation: $r_p$

Many researchers have studied improved estimators of
correlation (see Hotelling (1953) and Anderson (1996) for
interesting perspectives). The most common estimator
is known as "Pearson's r" and is given by

$$r_p \equiv \frac{\langle (x - \langle x \rangle_N)(y - \langle y \rangle_N) \rangle_N}{\sqrt{\langle (x - \langle x \rangle_N)^2 \rangle_N \, \langle (y - \langle y \rangle_N)^2 \rangle_N}}. \qquad (4)$$

In addition to being unbiased and consistent, $r_p$ is
asymptotically efficient. The asymptotic noise can be
derived using the Fisher transformation (Fisher 1915,
1921): $\lim_{N \to \infty} N \langle \delta r_p^2 \rangle = (1 - \rho^2)^2$, which is indeed
the CRB.

Note that substituting the exact means and variances
into $r_p$ returns the original estimate $r_\infty$ and, remarkably,
decreases the quality of the estimate. As a simple
example, three randomly generated samples with correlation
$\rho = 0.999$ are $\{x_i\} = \{0.998, 1.712, -0.992\}$,
$\{y_i\} = \{1.01, 2.01, -0.980\}$. In this case, we obtain estimates
$r_\infty = 1.81$, $r_p = 0.997$. Indeed, for perfect correlation,
$\langle \delta r_p^2 \rangle \to 0$, whereas $\langle \delta r_\infty^2 \rangle = 2/N$. Pearson's estimate
accounts for the sample variance, which contributes much
of the noise in $r_\infty$. However, for small correlation, $\rho \to 0$,
the noise in $r_\infty$ and $r_p$ is identical (we further discuss this
feature in §6).

Hence, when the correlation is large, simply averaging 
products poorly approximates the correlation relative to 
other schemes. Even if the exact variance is known, the 
sample variance must still be incorporated to optimally 
estimate the correlation. 
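The three-sample example above is easy to reproduce. The sketch below (function names are illustrative) evaluates the average product and Pearson's r for the quoted samples:

```python
import math

def sample_covariance(x, y):
    """r_inf: average product, assuming zero means and unit variances."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    """Pearson's r (Eq. 4), using the sample means and sample variances."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [0.998, 1.712, -0.992]
y = [1.01, 2.01, -0.980]
r_inf = sample_covariance(x, y)   # about 1.81
r_p = pearson_r(x, y)             # about 0.997
```

The average product exceeds unity because the sample variances happen to be larger than one; Pearson's normalization removes exactly that fluctuation.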



3.3. Optimal Correlation with Known Signal Variance: $r_q$

Nevertheless, an exact knowledge of the variance can
be used to effectively improve the estimate of correlation.
In fact, this knowledge is generally assumed in radio
astronomy. For example, automatic gain control usually
sets the variances $\langle x^2 \rangle = \langle y^2 \rangle = 1$, and quantization
schemes use the "known" variance to determine the
appropriate level settings. Errors in the signal estimate are
then a source of both bias and noise, so for non-stationary
signals such as pulsars, the quantization weights must
be dynamically adjusted (see Jenet & Anderson 1998).
For a more complete discussion of quantization noise,
see Gwinn (2004) and Gwinn (2006).

More generally, whenever the timescale of variation of
$\rho$ is shorter than that of variation in the standard
deviation $\sigma$ of $x$ and $y$, there will be improved measures of
correlation.

For example, if the standard deviations of $x$ and $y$ are
known to be unity, then the MLE of correlation, denoted
$r_q$, is determined by the real solution of

$$r_\infty (1 + r_q^2) - r_q \left[ \langle x^2 \rangle_N + \langle y^2 \rangle_N - (1 - r_q^2) \right] = 0. \qquad (5)$$

In Appendix A we derive this result, give an approximate form
for $r_q$, and demonstrate that the noise in this estimate
achieves the CRB as $N \to \infty$, as expected for an MLE:
$\lim_{N \to \infty} N \langle \delta r_q^2 \rangle = (1 - \rho^2)^2 / (1 + \rho^2)$. The advantage
of $r_q$ relative to $r_p$ increases with $|\rho|$, and gives a factor
of two improvement in the estimator variance at high
correlation. Moreover, the bias in $r_q$ is $\mathcal{O}(N^{-1})$.
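The cubic of Eq. 5 can be solved with any root finder. The sketch below uses plain bisection on $[-1, 1]$, where a sign change is guaranteed because $|\langle xy \rangle_N| \leq (\langle x^2 \rangle_N + \langle y^2 \rangle_N)/2$. The function name and the choice of bisection are illustrative assumptions, not the authors' implementation.

```python
def mle_known_variance(r_inf, sx2, sy2, iterations=200):
    """Root in [-1, 1] of Eq. 5, found by bisection.

    r_inf: sample covariance <xy>_N; sx2, sy2: sample mean squares <x^2>_N, <y^2>_N.
    """
    def f(r):
        return r_inf * (1.0 + r * r) - r * (sx2 + sy2 - (1.0 - r * r))

    lo, hi = -1.0, 1.0  # f(lo) >= 0 >= f(hi) since |r_inf| <= (sx2 + sy2) / 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When the sample variances equal their expected value of one, the solution reduces to $r_\infty$. For very small $N$ the cubic can have more than one real root in the interval, in which case bisection returns only one of them.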

4. CORRELATION ESTIMATORS FOR 
QUANTIZED SIGNALS 

In practice, data are digitized, which involves quan- 
tization according to a prescribed scheme. The merit 
of the quantization, reduction of data volume, must be 
carefully weighed against its drawback, degraded signal 
information. 

We first review the details of quantization and the tra- 
ditional estimators of correlation, which rely on the sam- 
ple covariance after quantization. We then derive the 
MLEs of correlation for arbitrary quantization schemes 
and give expressions for the noise in these estimators. 

4.1. The Quantization Transfer Function 

The process of quantization maps each element in a
time series $x_i \in \mathbb{R}$ to one of a set of $L = 2^b$ discrete values:
$x_i \mapsto x_{L,i}$, where $b$ is the number of bits in the
quantization scheme. This transfer function involves $L - 1$
thresholds, which partition $\mathbb{R}$ into $L$ subsets, and $L$
respective weights for these subsets.

4.2. Quantized Correlation via Sample Covariance 

The traditional correlation estimator for quantized
signals matches the form of the continuous covariance
estimator, $r_\infty$, to the quantized signals: $r_L \equiv \langle x_L y_L \rangle_N$
(Van Vleck & Middleton 1966; Cole 1968; Cooper 1970;
Hagen & Farley 1973). In some cases, this result is then
appropriately transformed to account for bias. This
correlation strategy is optimized through the particular
thresholds and weights that determine the transfer
function of the quantization.
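A transfer function of this kind, together with the traditional sample-covariance estimator, can be sketched in a few lines. The names below are illustrative; the two-bit thresholds and weights are the values quoted in §5.2, in units of the signal standard deviation.

```python
import bisect

def quantize(x, thresholds, weights):
    """Map x to the weight of the interval it falls in.

    thresholds: ascending list of L - 1 values; weights: list of L values.
    """
    return weights[bisect.bisect_right(thresholds, x)]

def quantized_covariance(xs, ys, thresholds, weights):
    """r_L: average product of the quantized samples."""
    products = [quantize(a, thresholds, weights) * quantize(b, thresholds, weights)
                for a, b in zip(xs, ys)]
    return sum(products) / len(products)

# Two-bit (four-level) scheme with the values quoted in Section 5.2.
V0, N_WEIGHT = 0.9815, 3.3359
TWO_BIT_THRESHOLDS = [-V0, 0.0, V0]
TWO_BIT_WEIGHTS = [-N_WEIGHT, -1.0, 1.0, N_WEIGHT]
```

For one-bit quantization the same function applies with `thresholds=[0.0]` and `weights=[-1.0, 1.0]`.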

4.3. The MLE of Correlation for Quantized Signals

We now derive the MLE of correlation $r_{L,\rm ML}$ for
quantized signals, which is both consistent and asymptotically
efficient. In particular, the likelihood function $\mathcal{L}$ for a
set of $N$ independent and identically distributed (i.i.d.)
pairs of samples $\{x_{L,i}, y_{L,i}\}$ drawn from a bivariate
normal distribution and then quantized in a scheme with $L$
levels is

$$\mathcal{L}(\rho, \sigma | \{x_{L,i}, y_{L,i}\}) = \prod_{i=1}^{N} P(x_{L,i}, y_{L,i}; \rho, \sigma)
\quad\Rightarrow\quad
\ln \mathcal{L}(\rho, \sigma | \{x_{L,i}, y_{L,i}\}) = \sum_{\ell} \mathcal{N}_\ell \ln \mathcal{P}_\ell(\rho, \sigma). \qquad (6)$$

In this expression, $\ell$ runs over the possible quantized
pairs $\{x_L, y_L\}$; $\mathcal{N}_\ell$ is the total number of samples in each
such category; and $\mathcal{P}_\ell(\rho, \sigma)$ corresponds to the
probability of a sampled pair $\{x_L, y_L\}$ falling in that category.

To determine the MLE, this log-likelihood must be
maximized with respect to $\rho$, if $\sigma$ is assumed to be known,
or with respect to $\rho$ and $\sigma$, if $\sigma$ is unknown. Although
we have assumed symmetry $\sigma_x = \sigma_y$, the generalization
is straightforward.

The MLE thus requires an evaluation of each probability
$\mathcal{P}_\ell(\rho, \sigma)$, which can be written

$$\mathcal{P}_\ell \equiv S_\ell \int_{\mathcal{R}_\ell} dx\, dy\, P(x, y; \rho, \sigma). \qquad (7)$$

In this expression, $P(x, y; \rho, \sigma)$ is given by Eq. 2, $\mathcal{R}_\ell$
corresponds to the set of unquantized values that map
to each quantized state, and $S_\ell \in \mathbb{Z}$ is an optional
symmetry factor, to account for the symmetry under inversion,
$P(x_{L,i}, y_{L,i}) = P(-x_{L,i}, -y_{L,i})$, and transposition,
$P(x_{L,i}, y_{L,i}) = P(y_{L,i}, x_{L,i})$. In a few instances, such as
the quadrant integrals that arise in one-bit correlation,
Eq. 7 has a simple, closed-form representation. More
generally, it can be reduced to a one-dimensional
integral of an error function.
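The reduction to a one-dimensional integral, and the resulting likelihood maximization, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: $\sigma = 1$ is taken as known, every cell is treated separately ($S_\ell = 1$), the integral uses Simpson's rule with infinite limits truncated at $\pm 8$, and the maximum is located by a coarse grid search. All function names are hypothetical.

```python
import math

def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal cumulative distribution (accepts +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cell_prob(rho, xlo, xhi, ylo, yhi, n=200, cut=8.0):
    """Probability of one quantization cell (Eq. 7 with S = 1, sigma = 1).

    The y integral is done analytically, leaving a 1-D Simpson integral in x.
    Infinite x limits are truncated at +/- cut; n must be even.
    """
    a, b = max(xlo, -cut), min(xhi, cut)
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return _phi(x) * (_Phi((yhi - rho * x) / s) - _Phi((ylo - rho * x) / s))

    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def log_likelihood(rho, counts, edges):
    """Eq. 6: sum over cells of (count in cell) * ln(cell probability)."""
    total = 0.0
    for (i, j), n_cell in counts.items():
        p = cell_prob(rho, edges[i], edges[i + 1], edges[j], edges[j + 1])
        total += n_cell * math.log(max(p, 1e-300))  # guard against underflow
    return total

def mle_grid(counts, edges, step=0.02):
    """Coarse grid search for the rho that maximizes the log-likelihood."""
    steps = int(round(1.96 / step))
    grid = [round(-0.98 + step * k, 10) for k in range(steps + 1)]
    return max(grid, key=lambda r: log_likelihood(r, counts, edges))
```

Here `edges` lists the cell boundaries including $\pm\infty$ (for two-bit data, `[-inf, -v0, 0, v0, inf]`), and `counts` maps each index pair `(i, j)` to the number of samples in that cell. A production correlator would refine the grid maximum with a proper one-dimensional optimizer.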

Thus, in most cases, the MLE requires maximization
of a function that involves one-dimensional numerical
integration. However, many strategies can simplify this
estimation. For example, if both the number of samples
$N$ and quantization bits $b$ are small, then all required
solutions can be tabulated. After including the symmetry
reductions, the number of distinct correlation
possibilities is $N_\ell = 2^{b-1} (1 + 2^{b-1})$. The total number $M$
of partitions of $N$ samples into these categories is then
$M = \binom{N + N_\ell - 1}{N_\ell - 1} \approx N^{N_\ell - 1} / (N_\ell - 1)!$. If $M$ is prohibitively
large, then the $N$ samples can first be partitioned and
then the respective correlation estimates averaged to
obtain an approximation of the MLE.
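The counting in this paragraph can be verified by brute force. The sketch below (illustrative names, integer labels standing in for the quantization levels) compares the closed form for the number of categories with an explicit enumeration of level pairs up to inversion and transposition, and evaluates the size of the lookup table with `math.comb`:

```python
import math
from itertools import product

def n_categories(bits):
    """Closed form quoted in the text: 2^(b-1) * (1 + 2^(b-1))."""
    half = 2 ** (bits - 1)
    return half * (1 + half)

def n_categories_by_enumeration(bits):
    """Count pairs of levels up to inversion and transposition."""
    half = 2 ** (bits - 1)
    levels = [v for v in range(-half, half + 1) if v != 0]  # symmetric labels
    seen = set()
    for x, y in product(levels, repeat=2):
        orbit = {(x, y), (-x, -y), (y, x), (-y, -x)}
        seen.add(min(orbit))  # one representative per equivalence class
    return len(seen)

def n_tabulated(n_samples, bits):
    """Number of ways to distribute N samples among the categories."""
    k = n_categories(bits)
    return math.comb(n_samples + k - 1, k - 1)
```

For two bits there are six categories, so even ten samples already require a table with a few thousand entries.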



4.4. Noise in the MLE and the Cramer-Rao 
Lower-Bound 

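As we have already mentioned, the CRB determines
the minimum variance that any unbiased estimator of $\rho$
can achieve. In terms of the likelihood function of §4.3,
the elements of the $2 \times 2$ Fisher information matrix for
$\{\rho, \sigma\}$ are

$$\mathcal{I}_{1,1} = N \sum_\ell \frac{1}{\mathcal{P}_\ell} \left( \frac{\partial \mathcal{P}_\ell}{\partial \rho} \right)^2, \quad
\mathcal{I}_{2,2} = N \sum_\ell \frac{1}{\mathcal{P}_\ell} \left( \frac{\partial \mathcal{P}_\ell}{\partial \sigma} \right)^2, \quad
\mathcal{I}_{1,2} = \mathcal{I}_{2,1} = N \sum_\ell \frac{1}{\mathcal{P}_\ell} \frac{\partial \mathcal{P}_\ell}{\partial \rho} \frac{\partial \mathcal{P}_\ell}{\partial \sigma}. \qquad (8)$$

If $\sigma$ is known, then the minimum variance of an unbiased
estimator of $\rho$ is $\delta r_{\rm CR}^2 = \mathcal{I}_{1,1}^{-1}$; if $\sigma$ is unknown, then the
minimum variance is $\delta r_{\rm CR}^2 = \mathcal{I}_{2,2} / (\mathcal{I}_{1,1} \mathcal{I}_{2,2} - \mathcal{I}_{1,2}^2)$.
The MLE is asymptotically efficient, so $\langle \delta r_{L,\rm ML}^2 \rangle \to \delta r_{\rm CR}^2$ as $N \to \infty$.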



5. EXAMPLES 



5.1. One-bit Quantization 

In the standard one-bit, or two-level, quantization
scheme, each sample is reduced to one "sign" bit:
$x \mapsto x_2 \equiv \mathrm{sign}(x)$. Because the sample error for the signal
variance incurs the bulk of the noise in $r_\infty$, quantization
actually improves upon the estimate of $r_\infty$ in some cases.

Explicitly, we have $r_2 \equiv \langle x_2 y_2 \rangle_N$. However, this
estimate is biased: $\langle r_2 \rangle = 2\pi^{-1} \sin^{-1} \rho$. The standard
Van Vleck clipping correction, denoted $r_{2,V}$, improves
the bias to $\mathcal{O}(1/N)$ by simply inverting this relationship
(Van Vleck & Middleton 1966):

$$r_{2,V} \equiv \sin\left( \frac{\pi}{2} r_2 \right). \qquad (9)$$

In fact, $r_{2,V}$ gives precisely the MLE. To see this,
note that the quantized products, $x_2 y_2$, have probability
$P(\pm 1) = \frac{1}{2} \pm \frac{1}{\pi} \arcsin \rho$. Maximizing the log-likelihood
(Eq. 6) with respect to $\rho$ gives that $r_{2,\rm ML} = r_{2,V}$.
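This equivalence can be confirmed numerically by maximizing the one-bit log-likelihood directly and comparing the result with the Van Vleck formula. The sketch below uses a ternary search, which is valid here because the log-likelihood has a single maximum in $\rho$ when both counts are non-zero. The function names are illustrative.

```python
import math

def van_vleck(r2):
    """Van Vleck correction (Eq. 9)."""
    return math.sin(0.5 * math.pi * r2)

def one_bit_log_likelihood(rho, n_plus, n_minus):
    """Eq. 6 for one-bit data; n_plus / n_minus count products equal to +1 / -1."""
    a = math.asin(rho) / math.pi
    return n_plus * math.log(0.5 + a) + n_minus * math.log(0.5 - a)

def one_bit_mle(n_plus, n_minus, iterations=200):
    """Ternary search for the maximum over -1 < rho < 1."""
    lo, hi = -1.0 + 1e-12, 1.0 - 1e-12
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if one_bit_log_likelihood(m1, n_plus, n_minus) < one_bit_log_likelihood(m2, n_plus, n_minus):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)
```

For example, 75 products of $+1$ and 25 of $-1$ give $r_2 = 0.5$, and both routes return $\sin(\pi/4) \approx 0.707$ to within the precision of the search.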

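Because $r_{2,V}$ is the MLE, the noise for large $N$ is given
by the CRB. Substituting $P(\pm 1)$ into Eq. 8 gives

$$N \delta r_{2,\rm CR}^2 = \left[ \left( \frac{\pi}{2} \right)^2 - (\arcsin \rho)^2 \right] (1 - \rho^2). \qquad (10)$$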



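We can easily verify that the noise in $r_{2,V}$ actually
achieves this lower bound. Namely, the correlation
estimate $r_2$ is a one-dimensional random walk with $N$ steps
of length $\pm 1/N$, distributed according to $P(\pm 1)$. For
large $N$, the central limit theorem gives that $r_2$ follows a
Gaussian distribution with mean $2\pi^{-1} \arcsin \rho$ and
variance $N^{-1} \left[ 1 - (2\pi^{-1} \arcsin \rho)^2 \right]$. In this limit, we obtain

$$\langle r_{2,V}^2 \rangle = \frac{1}{2} \left\{ 1 - (1 - 2\rho^2) \exp\left[ \frac{4 (\arcsin \rho)^2 - \pi^2}{2N} \right] \right\}
\quad\Rightarrow\quad
N \langle \delta r_{2,V}^2 \rangle \to \left[ \left( \frac{\pi}{2} \right)^2 - (\arcsin \rho)^2 \right] (1 - \rho^2), \qquad (11)$$

which is identical to the CRB.

The most striking improvement of $r_{2,V}$ relative to $r_\infty$
occurs as $\rho \to \pm 1$; in this limit, the one-bit correlation
has no noise, while $\langle \delta r_\infty^2 \rangle = 2/N$.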
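The two asymptotic noise expressions are simple enough to compare directly. The sketch below evaluates $N \langle \delta r^2 \rangle$ for the unquantized sample covariance (§3.1) and for the Van Vleck estimator (Eq. 10), and locates by bisection the correlation above which the one-bit estimate becomes the less noisy of the two. The function names are illustrative.

```python
import math

def noise_sample_covariance(rho):
    """N <dr^2> for the unquantized sample covariance r_inf (Section 3.1)."""
    return 1.0 + rho * rho

def noise_one_bit(rho):
    """N <dr^2> for the Van Vleck estimator, Eq. 10."""
    return ((0.5 * math.pi) ** 2 - math.asin(rho) ** 2) * (1.0 - rho * rho)

def crossover(lo=0.0, hi=1.0, iterations=100):
    """Bisection for the rho above which one-bit noise drops below that of r_inf."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if noise_one_bit(mid) > noise_sample_covariance(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At $\rho = 0$ the one-bit noise is $\pi^2/4 \approx 2.47$ times that of the sample covariance. It falls monotonically to zero at $|\rho| = 1$, and the two curves cross at a correlation between 0.5 and 0.6.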

5.2. Two-bit Quantization

Perhaps the most common quantization strategy
replaces each sample by a pair of bits for sign and
magnitude. The (non-zero) thresholds $\pm v_0$ are fixed at some
level relative to the estimated RMS signal voltage $\sigma$ in a
way that minimizes the expected RMS noise in the
subsequent correlation estimates. The resulting four levels
are then assigned weights $x_4 \in \{\pm 1, \pm n\}$, where $n$ is also
chosen to minimize the noise. In terms of the mean
quantized product $r_4 \equiv \langle x_4 y_4 \rangle_N$, one obtains the correlation
estimate (TMS, Eq. 8.43)

$$\hat{r}_4 \equiv \frac{r_4}{\Phi + n^2 (1 - \Phi)}, \qquad \Phi \equiv \mathrm{erf}\left( \frac{v_0}{\sigma\sqrt{2}} \right). \qquad (12)$$

This estimate of correlation, which already assumes
exact knowledge of $\sigma$, retains a significant ($\sim$10%) bias to
high $|\rho|$ (see Figure 1 of Jenet & Anderson (1998)). If
$|\rho| \lesssim 0.8$, for instance, then the appropriate correction is
simply a constant scaling factor (TMS, Eq. 8.45):

$$r_{4,V} \equiv \left\{ \frac{\pi \left[ \Phi + n^2 (1 - \Phi) \right]}{2 \left[ (n - 1) E + 1 \right]^2} \right\} \hat{r}_4, \qquad E \equiv e^{-v_0^2 / (2\sigma^2)}. \qquad (13)$$

For additional details and a complete formulation to
remove the bias, see Gwinn (2004). Here, we use the "V"
subscript to draw analogy with the Van Vleck correction
for one-bit correlation. Namely, this estimate calculates
the sample covariance after quantization and then
performs a bias correction according to the estimated
correlation. The remaining bias is $\mathcal{O}(1/N)$.
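The constant rescaling is straightforward to evaluate. The sketch below assumes the standard four-level expressions of TMS (their Eqs. 8.43 and 8.45), with $\Phi = \mathrm{erf}(v_0/\sigma\sqrt{2})$ and $E = \exp(-v_0^2/2\sigma^2)$. The function names are illustrative.

```python
import math

def four_level_scale(v0, n, sigma=1.0):
    """Small-|rho| ratio of Eq. 12 to rho: 2[(n-1)E+1]^2 / (pi [Phi + n^2 (1-Phi)])."""
    big_phi = math.erf(v0 / (sigma * math.sqrt(2.0)))
    e = math.exp(-0.5 * (v0 / sigma) ** 2)
    return 2.0 * ((n - 1.0) * e + 1.0) ** 2 / (math.pi * (big_phi + n * n * (1.0 - big_phi)))

def r4_corrected(r4, v0, n, sigma=1.0):
    """Eq. 12 followed by the constant rescaling of Eq. 13."""
    big_phi = math.erf(v0 / (sigma * math.sqrt(2.0)))
    r4_hat = r4 / (big_phi + n * n * (1.0 - big_phi))
    return r4_hat / four_level_scale(v0, n, sigma)
```

For the commonly quoted settings $v_0 = 0.9815$ and $n = 3.3359$ the scale factor is about 0.88. With $n = 1$ the scheme degenerates to one-bit correlation and the factor becomes $2/\pi$, the small-$\rho$ slope of the Van Vleck relation.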

Researchers then optimize this two-bit correlation
scheme by a particular choice of thresholds and weights:
$v_0 \approx 0.9815$, $n \approx 3.3359$. However, unlike one-bit
correlation, the bias-corrected quantized product $r_{4,V}$ is
not the optimal estimator of correlation for quantized
data. In particular, $r_{4,V}$ even reflects the disturbing
feature of the continuous estimate $r_\infty$ that the noise tends
to increase with $|\rho|$. Hence, high correlations present
the paradoxical situation in which traditional estimates
$\{r_\infty, r_{4,V}, r_{2,V}\}$ perform better as the number of bits is
reduced.

This troubling evolution merely reflects the
incompleteness of these correlation estimates. Figure 2
compares the noise in $r_{4,V}$ to the noise in the MLE, both
when $\sigma$ is known and unknown. Each MLE has negligible
bias and noise that reflects the behavior seen in the
corresponding unquantized MLE, $r_p$ or $r_q$. The noise is
always lower than that of $r_{2,V}$ and approaches zero as
$|\rho| \to 1$. We therefore resolve the puzzling evolution of
correlation noise after quantization.

The only remaining barrier is the computational
difficulty of implementation. However, for small values of
$N$, the maximum-likelihood solutions may be tabulated
prior to calculation; the required number of tabulated
values is $M = \binom{N+5}{5} \approx N^5/5!$ (see §4.3). Alternatively,
one can first partition the $N$ samples, then calculate the
MLE of correlation for each subset via tabulation, and
finally average the results. We defer a comprehensive
treatment of these implementation strategies to a future
work.

5.3. Many-bit Quantization 



Figure 2. Noise in estimates of correlation for signals quantized
with two bits, plotted against correlation $\rho$. The chosen levels
($v_0 = 0.9815$) and weights ($n = 3.3359$) are optimal as $|\rho| \to 0$.
The upper curve gives the noise in the traditional estimator via
sample covariance, as derived in Gwinn (2004), whereas the lower
curves give the noise in the MLEs with and without knowledge of $\sigma$.

Modern instrumentation now permits the storage of
baseband data with many-bit quantization schemes. In
this case, the noise in the MLE of correlation rapidly
approaches that in the corresponding unquantized limit,
$r_p$ or $r_q$ (see Figure 3). In such cases, these estimators
for unquantized signals provide excellent approximations
of the quantized MLEs, and the primary concerns
are the influence of RFI and instrumental limitations
(TMS). Furthermore, although low-bit quantization
schemes are quite robust to impulsive RFI, estimates
such as $r_p$ are not, so alternative quantization schemes
that are robust at the expense of increased noise may be
preferred (Fridman 2009).

We now consider the incurred bias when approximating
the quantized MLE by $r_p$. Specifically, consider a
high-bit scheme with $L$ levels, thresholds in multiples of
$\pm v_0$, and quantization weights $x_L$ that are the average values
of their respective preimages. We denote the corresponding
estimator $r_{L,p}$. Then, if the highest thresholds
extend far into the tail of the distribution, the bias after
quantization is approximately

$$\langle r_{L,p} \rangle - \rho \approx -\frac{\rho}{12} \left( \frac{v_0}{\sigma} \right)^2. \qquad (14)$$

For more general expressions, which include the effects
of the finite outer thresholds, consult the discussion in
§8.3 of TMS. While correcting the bias is straightforward,
even for a low number of bits, this strategy is ineffective
for low-bit schemes because $r_{L,p}$ is not a sufficient
statistic for $\rho$.
6. REDUCTIONS FOR SMALL CORRELATION 

Although the MLE of correlation decreases the noise
for large $|\rho|$, it exhibits identical noise to traditional
estimators at small $|\rho|$. Furthermore, in this limit, knowledge
of $\sigma$ does not reduce the noise. These features both
arise from the form of the bivariate Gaussian PDF in this
limit.

Figure 3. Noise in the MLE of correlation for quantized signals as
a function of correlation for various quantization levels. The upper
panel shows the noise when $\sigma$ is known, and the lower panel shows
the additional noise when $\sigma$ is unknown. For simplicity, we set
the $(L - 1)$ quantization thresholds in multiples of $\pm 4/L$. Observe
that the reduction in noise provided by knowledge of $\sigma$ becomes
more pronounced as the number of levels is increased. For example,
knowledge of $\sigma$ provides no improvement for one-bit correlation.

For example, consider the estimation of correlation for
unquantized signals when $\sigma_x = \sigma_y = 1$ is known. Then,
if $|\rho| \ll 1$, the joint PDF of $N$ independently-drawn pairs
of correlated random variables $\{x_i, y_i\}$ is (see Eq. 3)

$$P(\{x_i, y_i\}; \rho) \approx \frac{1}{(2\pi)^N} \left( 1 + N \rho \langle xy \rangle_N \right) e^{-\frac{N}{2} \left( \langle x^2 \rangle_N + \langle y^2 \rangle_N \right)}. \qquad (15)$$

Thus, from the factorization criterion, $r_\infty \equiv \langle xy \rangle_N$ is a
sufficient statistic for $\rho$, and so we expect the asymptotic
noise in $r_\infty$ to match that of $r_q$ as $\rho \to 0$.

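Likewise, consider the joint distribution of the samples
after quantization into $L$ weighted levels. In this case,
we require the set of quantized probabilities

$$\mathcal{P}_\ell \approx \frac{1}{2\pi} \int_{\mathcal{R}_\ell} dx\, dy\, (1 + \rho x y)\, e^{-\frac{1}{2}(x^2 + y^2)}. \qquad (16)$$

The joint PDF of the quantized samples is then

$$P(\{x_{L,i}, y_{L,i}\}; \rho) = \prod_\ell \mathcal{P}_\ell^{\mathcal{N}_\ell}
\approx \left[ \prod_\ell \mathcal{P}_\ell(0)^{\mathcal{N}_\ell} \right]
\left[ 1 + \rho \sum_\ell \mathcal{N}_\ell \frac{\int_{\mathcal{R}_\ell} dx\, dy\, xy\, e^{-\frac{1}{2}(x^2 + y^2)}}{\int_{\mathcal{R}_\ell} dx\, dy\, e^{-\frac{1}{2}(x^2 + y^2)}} \right]. \qquad (17)$$

Hence, for small $|\rho|$, the factorization criterion gives that
$\langle w(x, y) \rangle_N$ is a sufficient statistic for $\rho$, if the weight function is

$$w(x, y) \equiv \frac{\int_{\mathcal{R}_\ell} dx\, dy\, xy\, e^{-\frac{1}{2}(x^2 + y^2)}}{\int_{\mathcal{R}_\ell} dx\, dy\, e^{-\frac{1}{2}(x^2 + y^2)}}
= \left[ \frac{\int_{\mathcal{R}_{\ell,x}} dx\, x\, e^{-x^2/2}}{\int_{\mathcal{R}_{\ell,x}} dx\, e^{-x^2/2}} \right]
\left[ \frac{\int_{\mathcal{R}_{\ell,y}} dy\, y\, e^{-y^2/2}}{\int_{\mathcal{R}_{\ell,y}} dy\, e^{-y^2/2}} \right], \qquad (18)$$

where $\mathcal{R}_{\ell,x} \subset \mathbb{R}$ defines the range of values spanned by
each quantized level.

Moreover, the final factorization in Eq. 18 demonstrates
that, by assigning an appropriate weight to each
quantization level, $w(x, y) = x_L y_L$, the sample covariance
is a sufficient statistic for $\rho$ and will achieve optimal noise
performance as $|\rho| \to 0$.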

The asymptotic noise in this limit is then the CRB:

$$\delta r_{\rm CR}^2 = \left\{ N \sum_\ell \frac{\left[ \int_{\mathcal{R}_\ell} dx\, dy\, xy\, e^{-\frac{1}{2}(x^2 + y^2)} \right]^2}{2\pi \int_{\mathcal{R}_\ell} dx\, dy\, e^{-\frac{1}{2}(x^2 + y^2)}} \right\}^{-1}. \qquad (19)$$

Minimizing this equation yields the optimal thresholds.
Then, Eq. 18 immediately determines the optimal
weights. Observe that these weights are slightly
different than those of some previous works, such as
Jenet & Anderson (1998), but match the ratios of
traditional quantization schemes, such as $n = 3.336$ when
$v_0 = 0.982$ for two-bit correlation, for instance.
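Both statements can be checked numerically, because the Gaussian integrals over each level have closed forms. The sketch below assumes unit-variance signals and symmetric treatment of $x$ and $y$. It computes each level's weight as the conditional mean of the samples in that level (the factorized form of Eq. 18) and evaluates the small-correlation noise of Eq. 19, whose double sum over cells reduces to the square of a single sum over levels. The function names are illustrative.

```python
import math

def level_stats(lo, hi):
    """Return (p, m): probability and first moment of a unit Gaussian on [lo, hi]."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def pdf(z):
        return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    return cdf(hi) - cdf(lo), pdf(lo) - pdf(hi)

def optimal_weights(thresholds):
    """Conditional mean of each level (the factorized form of Eq. 18)."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    stats = [level_stats(a, b) for a, b in zip(edges, edges[1:])]
    return [m / p for p, m in stats]

def small_rho_noise(thresholds):
    """N <dr^2> from Eq. 19; the sum over cells factorizes into a square."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    stats = [level_stats(a, b) for a, b in zip(edges, edges[1:])]
    per_axis = sum(m * m / p for p, m in stats)
    return 1.0 / per_axis ** 2
```

With thresholds `[-0.9815, 0.0, 0.9815]` the ratio of outer to inner weight comes out close to 3.336, and the noise is lower than for nearby threshold choices such as $\pm 0.8$ or $\pm 1.2$. With the single threshold `[0.0]` the noise reduces to $\pi^2/4$, the $\rho \to 0$ limit of Eq. 10.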

Finally, $\mathcal{I}_{1,2} \to 0$ as $\rho \to 0$. This result follows easily
by substituting $\mathcal{P}_\ell$ and its derivatives into Eq. 8. Hence,
the CRB is unchanged by knowledge of $\sigma$ in this limit.

7. SUMMARY 

We have explored the paradoxical scaling of noise in 
traditional estimates of correlation for quantized signals. 
In particular, we have shown that the decrease in noise 
that quantization affords is a result of an incomplete com- 
parison with unquantized correlation schemes and of sub- 
optimal correlation strategies for quantized signals. 

We have derived the MLE of correlation, both with 
and without knowledge of the signal variance and quan- 
tization, and we have compared these estimates to tra- 
ditional schemes. The MLE has negligible bias, lower 
noise, and is asymptotically efficient: for a large num- 
ber of samples, no other unbiased scheme will achieve 
lower noise. We have also derived simple expressions for 
this asymptotic noise (the CRB). While the MLE gives 
the familiar Van-Vleck corrected sample covariance for 
one-bit quantization, it differs significantly from current 
correlation strategies for all other cases. 

Nevertheless, traditional correlation schemes are fully
optimized in the limit $\rho \to 0$. Namely, for suitably chosen
weights, the sample covariance $r_L$ is a sufficient statistic
for the correlation $\rho$ in this limit.

Future detectors, such as the Square-Kilometer Array, 
that will achieve high signal-to-noise while being limited 
to a small number of quantization bits, can benefit from 
these novel correlation strategies to reduce both the dis- 
tortion and noise introduced by quantization. 

APPENDIX

A. MLE FOR UNQUANTIZED SIGNALS WITH KNOWN VARIANCE 

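We now summarize the main features of the MLE of correlation for samples $\{x_i, y_i\}$ drawn from a bivariate Gaussian
distribution with known means and variances. See Kendall & Stuart (1979) for additional details. For simplicity, we
assume that the means are zero and the variances are unity. We also assume that each pair is drawn independently.
The likelihood function is then

$$\mathcal{L}(\rho | \{x_i, y_i\}) = \left( \frac{1}{2\pi\sqrt{1 - \rho^2}} \right)^{N} \exp\left[ -\frac{1}{2 (1 - \rho^2)} \sum_{i=1}^{N} \left( x_i^2 + y_i^2 - 2\rho x_i y_i \right) \right]. \qquad (A1)$$

The condition for the likelihood function to be extremized is

$$r_\infty (1 + \rho^2) - \rho \left[ s_x^2 + s_y^2 - (1 - \rho^2) \right] = 0, \qquad (A2)$$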



where $s_x^2 \equiv \langle x^2 \rangle_N$, $s_y^2 \equiv \langle y^2 \rangle_N$, and $r_\infty \equiv \langle xy \rangle_N$. Hence, the triplet $\{r_\infty, s_x, s_y\}$ is sufficient for $\rho$. We will denote the
appropriate solution to this cubic equation $r_q$.
To obtain some intuition for this result, let $\epsilon \equiv s_x^2 + s_y^2 - 2$. Then $\langle \epsilon^2 \rangle = 4 (1 + \rho^2) / N$. The discriminant of the cubic
is

$$\Delta = -4 r_\infty^4 + r_\infty^2 \left( \epsilon^2 + 20 \epsilon - 8 \right) - 4 (1 + \epsilon)^3. \qquad (A3)$$

If $\Delta < 0$, then the cubic has a single real solution. As a rough rule of thumb, we can simply consider when all terms
are negative. Since $\delta\epsilon \sim 2/\sqrt{N}$, we see that there is likely a unique real solution whenever $\epsilon < 0.39$, or $N > 25$.
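The rule of thumb follows from the sign of the middle term of the discriminant. A small sketch (illustrative names) that evaluates Eq. A3 and the two quoted numbers:

```python
import math

def discriminant(r_inf, eps):
    """Discriminant of the cubic (Eq. A3); eps = s_x^2 + s_y^2 - 2."""
    return (-4.0 * r_inf ** 4
            + r_inf ** 2 * (eps ** 2 + 20.0 * eps - 8.0)
            - 4.0 * (1.0 + eps) ** 3)

# Largest eps for which the middle term is non-positive: eps^2 + 20 eps - 8 = 0.
EPS_MAX = -10.0 + math.sqrt(108.0)   # about 0.39

# Setting the typical fluctuation 2 / sqrt(N) equal to EPS_MAX gives the sample size.
N_MIN = (2.0 / EPS_MAX) ** 2         # about 26
```

The third term is negative whenever $\epsilon > -1$, so for $-1 < \epsilon < 0.39$ all three terms are non-positive and the cubic has a single real root regardless of $r_\infty$.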

Although finding this solution is both analytically and numerically straightforward, an approximation is both useful
and enlightening:

$$r_q \approx r_\infty \left( 1 - \frac{\epsilon}{1 + r_\infty^2} \right). \qquad (A4)$$

This expansion immediately identifies the appropriate root of the cubic equation. Furthermore, we can determine the
asymptotic noise for $r_q$ by expanding Eq. A4 for large $N$:



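$$\langle \delta r_q^2 \rangle = \langle \delta r_\infty^2 \rangle + \left( \frac{\rho}{1 + \rho^2} \right)^2 \langle \epsilon^2 \rangle - \frac{2\rho}{1 + \rho^2} \langle \delta r_\infty\, \epsilon \rangle. \qquad (A5)$$

A straightforward application of Isserlis' Theorem (Isserlis 1918) gives that $\langle \delta r_\infty^2 \rangle = (1 + \rho^2)/N$, $\langle \epsilon^2 \rangle = 4 (1 + \rho^2)/N$,
and $\langle \delta r_\infty\, \epsilon \rangle = 4\rho/N$. Putting everything together, we obtain

$$\lim_{N \to \infty} N \langle \delta r_q^2 \rangle = \frac{(1 - \rho^2)^2}{1 + \rho^2}.$$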

We can easily verify that this result is equal to the CRB: 
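A sketch of that check, using only the likelihood of Eq. A1 and the ensemble averages $\langle r_\infty \rangle = \rho$ and $\langle s_x^2 \rangle = \langle s_y^2 \rangle = 1$. The auxiliary function $g$ is introduced here for illustration and is simply the left-hand side of Eq. A2:

```latex
\frac{\partial \ln \mathcal{L}}{\partial \rho} = \frac{N\, g(\rho)}{(1 - \rho^2)^2},
\qquad
g(\rho) \equiv r_\infty (1 + \rho^2) - \rho \left[ s_x^2 + s_y^2 - (1 - \rho^2) \right],
\qquad
\langle g \rangle = 0,
\\
-\left\langle \frac{\partial^2 \ln \mathcal{L}}{\partial \rho^2} \right\rangle
= -\frac{N \langle g'(\rho) \rangle}{(1 - \rho^2)^2}
= \frac{N (1 + \rho^2)}{(1 - \rho^2)^2}
\quad\Rightarrow\quad
\delta r_{\rm CR}^2 = \frac{(1 - \rho^2)^2}{N (1 + \rho^2)} .
```

Here $g'(\rho) = 2\rho r_\infty - (s_x^2 + s_y^2) + 1 - 3\rho^2$, whose ensemble average is $-(1 + \rho^2)$. The term from differentiating the denominator is proportional to $\langle g \rangle$ and therefore vanishes.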